
What’s the deal with Alzheimer’s disease and amyloid?

At the end of last month, a scientific journal pulled a research paper on Alzheimer's disease.

The retraction came from Neurobiology of Aging, which removed a 2011 paper claiming to show that a version of a protein called amyloid-β was responsible for memory loss in Alzheimer's disease. On its own, that might not seem notable; bad papers can make it through peer review and are only caught after publication.

But this wasn't an isolated case. Over the past few years, multiple studies arguing that amyloid-β is the central driver of Alzheimer's disease have been retracted. Some scientists have even been indicted for fraud over the issue. All the while, none of the drugs targeting this protein and its pathway have had any real clinical effect.

Why does this keep happening?

Plaques and tangles

The medical condition we currently call Alzheimer's disease was first identified in 1906, after a neuropathologist named Alois Alzheimer examined brain tissue from the autopsy of Auguste Deter, a dementia patient he had been treating. Deter was just 55 when she died, much younger than most dementia patients. Alzheimer noted that her brain tissue contained plaques, which had previously been seen in other dementia patients, as well as tangles of nerve fibers, which had not.

For the next 80 years, that was about as much as we knew about this condition that robs sufferers of their memories, skills, and personalities. And until very recently, it was only possible to diagnose it post-mortem by examining the brain for those plaques and tangles. The advent of PET scanners and the discovery of biomarkers in blood have changed that.

It wasn't until 1984 that we identified amyloid-β accumulating in the plaques of people with Alzheimer's. Scientists weren't really sure what amyloid-β did, but another study found plenty of it in the brains of people with Down syndrome, who often suffer from dementia later in life. In fact, the gene that encodes amyloid-β—or, more accurately, an upstream molecule called amyloid precursor protein (APP)—is found on chromosome 21, and the signature of Down syndrome is an extra copy of that chromosome. Raising suspicions further, in 1987, patients with a familial form of Alzheimer's were found to carry a mutation in their amyloid precursor protein gene.

Something was causing amyloid-β to be cleaved from its precursor and then clump together in people with dementia. The conventional wisdom held that if only we could stop it from aggregating, or remove the aggregates from the brain, we could stop the disease.

In 2006, this idea looked even better, as a paper was published in Nature showing that memory loss was associated with a specific form of amyloid-β buildup outside of neurons.

Stay on target

A potential therapeutic target gave scientists something to aim for. As with so many other poorly understood, complex diseases, they set about studying it in mice. But as with so many of those diseases, mice don't naturally get Alzheimer's—they only do if you insert a mutated copy of the human APP gene into their genome. Armed with this early mouse model, scientists got to work.

In 1999, Elan Pharmaceuticals created a vaccine against a particular part of amyloid-β and then showed that mice would clear plaques from their brains after treatment with the vaccine. Better still, it worked whether the vaccine was given to very young mice, before the plaques could form, or to older mice in which the plaques were already present.

Vaccines work by prompting the body to produce antibodies against whatever antigen the vaccine presents. So a few years later, Elan went on to show that anti-amyloid antibodies also cleared plaques in the transgenic mice's brains when given directly.

But there's many a slip between mouse and man. Elan tried its vaccine in human patients suffering mild to moderate Alzheimer's disease but had to suspend the trial of 360 patients after a number developed brain inflammation. While Elan's vaccine didn't go anywhere, other pharmaceutical companies and biotechs were still on the case.

Trial after trial failed to arrest or reverse the disease, no matter the approach. Targeting different parts of the amyloid-β pathway also created side effects aplenty, some of them life-threatening or fatal. Regardless, amyloid-β remained the preferred target. Eventually, in 2021, the Food and Drug Administration approved an antibody called aducanumab, made by Biogen.

To call the approval controversial would be an understatement. Aducanumab had failed not one but two large double-blind, placebo-controlled phase III trials in 2019. Eventually, its makers scoured the data sets a little more, claiming to find a small reduction in amyloid-β plaque size and a small cognitive improvement in a particular group of participants.

Many scientists were outraged by the approval, and their outrage looked justified once we saw how the drug would be marketed: with a cognitive test that no one could pass. A congressional inquiry into aducanumab's approval found it was "rife with irregularities." But at $65,000 per patient per year, the drug represented a potential $18 billion-a-year revenue stream for Biogen.

Aducanumab was approved by the FDA in June 2021. But by early July, the regulator had already narrowed the set of people it would allow the drug to be given to, restricting it to just patients with a mild form of the disease. Biogen ended up losing money on it and removed the drug from the market in January 2024.

But only so it could concentrate on another amyloid-β-targeting antibody, this one developed with a biotech company called Eisai. This therapy, called lecanemab, was half the price of aducanumab, at $26,500 per year, and it got the nod from the FDA in 2023. There were plenty of questions about the approval because, yet again, there was very little data indicating that patients were getting any better. And there were still nasty side effects; three patients died from brain swelling and hemorrhaging.

Another antibody targeting amyloid-β, called donanemab, made headlines in 2023 when its maker, Eli Lilly, published trial data that claimed to slow the progression of the disease "by about 35 percent in the early stages." Again, this came with the risk of severe side effects like brain swelling and bleeding. Those side effects may have been the only way to tell someone was on the drug, given that it provided extremely mild cognitive benefits.

Surely we've had some other ideas?

We're now more than 40 years on from the identification of amyloid-β as the bad stuff in plaques and 30 years from being able to clear amyloid-β from the brains of mice (and, more recently, humans). Yet doing that is more likely to make an Alzheimer's patient's brain bleed than it is to restore cognitive function or even meaningfully slow its decline. But it's not like we haven't had other ideas.

Take inflammation, for instance. Brains aren't just made of neurons; those neurons are surrounded by glial cells, some of which envelop the neuronal junctions. Some of these glia behave much like macrophages, a kind of immune cell that goes a little haywire in heart disease and some other conditions. In 2008, a small-scale study showed that the arthritis drug etanercept, which inhibits an inflammatory cytokine called TNF-α, produced a rapid improvement in cognitive function in Alzheimer's patients.

The only hitch? The drug needed to be infused directly into the spinal column. A larger trial that used etanercept injections under the skin didn't run into any of the horrible side effects of the amyloid-β antibodies, but it also failed to show any real clinical benefit.

To others in the scientific community, the trigger for that inflammation is likely to be infection. Our immune system uses cytokines like TNF-α to fight infections, along with other chemicals like peroxynitrite, which causes oxidative stress, all of which are associated with inflammation.

Neuropathologists have identified viral infections in plaques, and a group from Tufts recently proposed a mechanism by which herpes simplex virus-1 could be driving the disease. But many other viruses have also been implicated; data-mining samples from biobanks in the UK and Finland found infections from several different viruses were associated with an increased risk of Alzheimer's disease (as well as other neurological disorders), with the most striking correlation being viral encephalitis.

Even influenza infection was associated with a five-fold increase in the risk of developing Alzheimer's. But again, the data is equivocal and a little confusing. In that study, the risk of developing Alzheimer's was greatest at one year after infection and then decreased over time. But we know that the disease takes decades to progress.

Bacterial infections have also come under scrutiny. Porphyromonas gingivalis is an anaerobic bacterium that's one of the main culprits in gum disease, and it has been linked to a range of common diseases, including atherosclerosis and—you guessed it—Alzheimer's. The idea is that P. gingivalis enters the bloodstream through abrasions in the mouth and then reaches the brain; the immune response causes the plaques and tangles to form.

Still other research has suggested a role for our gut microbiome, the vast collection of microbes that help us digest food (and, we've more recently discovered, do much more than that). Here, tantalizingly, there are other hints of therapeutic targets. For example, foods that reduce inflammation, such as those high in fiber or omega-3 fatty acids, may be neuroprotective, in addition to being good for your heart. And the APOE4 variant of the APOE gene, which results in high levels of LDL cholesterol, is also associated with an increased risk of Alzheimer's.

The problem with any of these hypotheses is that many, many more people will be infected with a virus or bacterium that has been implicated in Alzheimer's than will ever develop the disease. Two-thirds of people under 50 have HSV-1, for example, and they won't all get Alzheimer's. The same goes for people with gum disease or an influenza infection. Perhaps sparking the disease requires several pathogens acting in combination?

More likely, each of these can insult the brain and trigger plaque formation, but only in combination with other factors. Recently, a role for lithium deficiency has looked rather compelling.

The Amyloid Mafia

We would almost certainly know a lot more about those other potential causes had it not been for the so-called Amyloid Mafia. Scientists aren't immune to groupthink, and the people who decide who gets research grants have not been at all receptive to proposals investigating non-amyloid mechanisms.

"You were just lucky when you weren’t beaten up by the amyloid-β or tau people if you would mention immunology," said Michael Heneka, a neuroinflammation specialist interviewed by Nature in 2023. (Tau is another Alzheimer's-associated protein.)

Speaking to American Public Media, the former director of Alzheimer's research at the National Institute of Aging said, "It became gradually an infallible belief system. So everybody felt obligated to pay homage to the idea without questioning. And that's not very healthy for science when scientists... accept an idea as infallible. That's when you run into problems."

To make matters worse, it turned out that much of that confidence in amyloid-β as the one true cause was built on fake data.

That landmark 2006 Nature paper that claimed to show that a specific form of amyloid-β was the culprit causing the disease? It was retracted in 2024 after it emerged that the authors had faked some of the data, copy-pasting images of protein detections. In another case, a scientist at City University of New York was indicted last year for falsifying data that helped support the ideas behind an Alzheimer's drug being developed by Cassava Sciences. (For a more comprehensive look at the Amyloid Mafia, check out Charles Piller's work.) Sadly, this kind of scientific misconduct is more common than we'd like and can be hard to detect before publication.

Those FDA drug approvals have also been tainted. In addition to the aforementioned congressional investigation that found irregularities, the head of the FDA's neuroscience office was forced to step down in 2023 after it was found that he had an inappropriately close relationship with Biogen.

Despite this litany of clinical failures and research misconduct, it would be a stretch to say that the amyloid hypothesis is dead. Only one of the five FDA-approved therapies is independent of the amyloid pathway, and while other avenues are being explored, amyloid-β research still claims the lion's share of the field's attention.


Websites that hijack your back button must stop by June 15 or face Google's wrath

So you thought you'd just read that webpage and then go back to the previous page? A bold assumption. All too often, clicking the back button in your browser doesn't actually take you back. It's called back button hijacking, and Google has thus far tolerated it. That ends in June, when the company will designate it a "malicious practice," and any site continuing to do it will face consequences.

Back button hijacking is a way of wringing more pageviews out of visitors. It's common on sites that live and die on search traffic. You may end up on a page because it looks like something you want, but instead of letting you leave the domain, it manipulates your page history to insert something else when you click back.

The phantom page is usually a collection of additional content suggestions or a pop-up that tries to eke out a few more clicks from each visitor. Some sites get a little more creative with it, though. For example, LinkedIn has a nasty habit of sending you "back" to the social feed after you land on a link to a profile or job posting.

Google says the back button should always do what you expect it to do—go back. Anything else amounts to a deceptive user experience that can discourage users from visiting unfamiliar pages in the future. The company isn't inventing a new rule to address this behavior, which is apparently on the rise. Google will simply be more broadly enforcing the malicious practices policy, which says in part:

Malicious practices create a mismatch between user expectations and the actual outcome, leading to a negative and deceptive user experience, or compromised user security or privacy.

Sites that have been using back button hijacking are now under the gun to end the practice. Starting on June 15, 2026, sites using back button hijacking could be hit with either automated or manual anti-spam actions. That can result in a much lower page rank in search, which is a problem for sites that have traditionally relied on search traffic to stay afloat.

Google says that any site that uses back button hijacking should spend the next two months eliminating the practice. The early warning ensures they'll have a chance to get it done. While some websites have designed their own systems to do this, others have back button hijacking as a consequence of a third-party library or advertising stack. Whatever the origin of the hijack, sites will want to get it sorted out before the deadline to avoid a spam designation.


Thousands of Rare Concert Recordings Are Landing On the Internet Archive

Aadam Jacobs, a Chicago concert superfan who has recorded more than 10,000 shows since the 1980s, is working with Internet Archive volunteers to digitize the collection before the cassettes deteriorate. "So far, about 2,500 of these tapes have been posted on the Internet Archive, including some rare gems like a Nirvana performance from 1989," reports TechCrunch. From the report: For many of these recordings, Jacobs was using pretty mediocre equipment, but the volunteer audio engineers working with the Internet Archive have made these tapes sound great. One volunteer, Brian Emerick, drives to Jacobs' house once a month to pick up more boxes of tapes -- he has to use anachronistic cassette decks to play the tapes, which get converted into digital files. From there, other volunteers clean up, organize, and label the recordings, even tracking down song names from forgotten punk bands. The archive is available here.

Read more of this story at Slashdot.


How do you add or remove a handle from an active Wait­For­Multiple­Objects?, part 2

Last time, we looked at adding or removing a handle from an active Wait­For­Multiple­Objects, and we developed an asynchronous mechanism that requests that the changes be made soon. But asynchronous add/remove can be a problem because you might remove a handle and clean up the things that the handle depended upon, only to then receive a notification that the removed handle has been signaled, even though its dependencies are already gone.

What we can do is wait for the waiting thread to acknowledge the operation.

_Guarded_by_(desiredMutex) DWORD desiredCounter = 1;
DWORD activeCounter = 0;

void wait_until_active(DWORD value)
{
    DWORD current = activeCounter;
    while (static_cast<int>(current - value) < 0) {
        WaitOnAddress(&activeCounter, &current,
                      sizeof(activeCounter), INFINITE);
        current = activeCounter;
    }
}

The wait_until_active function waits until the value of active­Counter is at least as large as value. We do this by subtracting the two values, to avoid wraparound problems.¹ The comparison takes advantage of the guarantee in C++20 that conversion from an unsigned integer to a signed integer converts to the value that is numerically equal modulo 2ⁿ where n is the number of bits in the destination. (Prior to C++20, the result was implementation-defined, but in practice all modern implementations did what C++20 mandates.)²

You can also use std::atomic:

_Guarded_by_(desiredMutex) DWORD desiredCounter = 1;
std::atomic<DWORD> activeCounter;

void wait_until_active(DWORD value)
{
    DWORD current = activeCounter;
    while (static_cast<int>(current - value) < 0) {
        activeCounter.wait(current);
        current = activeCounter;
    }
}

As before, the background thread manipulates the desiredHandles and desiredActions, then signals the waiting thread to wake up and process the changes. But this time, the background thread blocks until the waiting thread acknowledges the changes.

// Warning: For expository purposes. Almost no error checking.
void waiting_thread()
{
    bool update = true;
    std::vector<wil::unique_handle> handles;
    std::vector<std::function<void()>> actions;

    while (true)
    {
        if (std::exchange(update, false)) {
            std::lock_guard guard(desiredMutex);

            handles.clear();
            handles.reserve(desiredHandles.size() + 1);
            std::transform(desiredHandles.begin(), desiredHandles.end(),
                std::back_inserter(handles),
                [](auto&& h) { return duplicate_handle(h.get()); });
            // Add the bonus "changed" handle
            handles.emplace_back(duplicate_handle(changed.get()));

            actions = desiredActions;

            if (activeCounter != desiredCounter) {
                activeCounter = desiredCounter;   
                WakeByAddressAll(&activeCounter); 
            }
        }

        auto count = static_cast<DWORD>(handles.size());
                        
        auto result = WaitForMultipleObjects(count,
                        handles[0].addressof(), FALSE, INFINITE);
        auto index = result - WAIT_OBJECT_0;
        if (index == count - 1) {
            // the list changed. Loop back to update.
            update = true;
            continue;
        } else if (index < count - 1) {
            actions[index]();
        } else {
            // deal with unexpected result
        }
    }
}

void change_handle_list()
{
    DWORD value;
    {
        std::lock_guard guard(desiredMutex);
        ⟦ make changes to desiredHandles and desiredActions ⟧
        value = ++desiredCounter;
        SetEvent(changed.get());
    }
    wait_until_active(value);
}

The pattern is that after a background thread makes its desired changes, it increments the desiredCounter and signals the event. It’s okay if multiple threads make changes before the waiting thread wakes up: the changes simply accumulate, and the event just stays signaled. Each background thread then waits for the waiting thread to process its change.

On the waiting side, we process changes as usual, but we also publish our current change counter if it has changed, to let the background threads know that we made some progress. Eventually, we will have made enough progress that all of the pending changes have been processed, and the last background thread will be released from wait_until_active.

¹ You’ll run into problems if the counter increments 2 billion times without the worker thread noticing. At a thousand increments per second, that’ll last you a month. I figure that if you have a worker thread that is unresponsive for that long, then you have bigger problems. But you can avoid even that problem by switching to a 64-bit integer, so that the overflow won’t happen before the sun is expected to turn into a red giant.

² The holdouts would be compilers for systems that are not two’s-complement.

The post How do you add or remove a handle from an active Wait­For­Multiple­Objects?, part 2 appeared first on The Old New Thing.


“The problem is Sam Altman”: OpenAI insiders don’t trust CEO

On the same day that OpenAI released policy recommendations to ensure that AI benefits humanity if superintelligence is ever achieved, The New Yorker dropped a massive investigation into whether CEO Sam Altman can be trusted to actually follow through on OpenAI's biggest promises.

Parsing the publications side by side can be disorienting.

On the one hand, OpenAI said it plans to push for policies to "keep people first" as AI starts "outperforming the smartest humans even when they are assisted by AI." To achieve this, the company vows to remain "clear-eyed" and transparent about risks, which it acknowledged includes monitoring for extreme scenarios like AI systems evading human control or governments deploying AI to undermine democracy. Without proper mitigation of such risks, "people will be harmed," OpenAI warned, before describing how the company could be trusted to advocate for a future where achieving superintelligence means a "higher quality of life for all."

On the other hand, The New Yorker interviewed more than 100 people familiar with how Altman conducts business. The publication also reviewed internal memos and interviewed Altman more than 12 times. The resulting story provides a lengthy counterpoint explaining why the public may struggle to trust OpenAI's CEO to "control the future" of AI, no matter how rosy the company's vision may appear.

Overall, insiders painted Altman as a people-pleaser who tells others what they want to hear while questing for power in an alleged bid to always put himself first. As one board member summed up Altman, he has "two traits that are almost never seen in the same person. The first is a strong desire to please people, to be liked in any given interaction. The second is almost a sociopathic lack of concern for the consequences that may come from deceiving someone."

While The New Yorker found no "smoking gun," its reporters reviewed messages from OpenAI's former chief scientist, Ilya Sutskever, and former research head, Dario Amodei, that documented "an accumulation of alleged deceptions and manipulations." Many of the incidents could be shrugged off individually, but when taken together, both men concluded that Altman was not fostering a safe environment for advanced AI, The New Yorker reported.

"The problem with OpenAI," Amodei wrote, "is Sam himself."

OpenAI is worried the public is souring on AI

Altman either disputed claims in the story or else claimed to have forgotten about certain events. He also attributed some of his shifting narratives to the changing landscape of AI and admitted that he's been conflict-avoidant in the past.

But his seeming contradictions are getting harder to ignore as scrutiny of OpenAI intensifies amid growing government reliance on its models and lawsuits that label its tech as unsafe.

Perhaps most visibly to the public, Altman has recently shifted away from positioning OpenAI as a sort of savior blocking AI doomsday scenarios, instead adopting a "tone" of "ebullient optimism," The New Yorker reported.

The policy recommendations echo this at times. Discussing the recommendations—which include experimenting with shorter work weeks and creating a public wealth fund to share AI profits—OpenAI's chief global affairs officer, Chris Lehane, confirmed to The Wall Street Journal that the company is urgently concerned about negative public opinions about AI. While announcing their big ideas to spare humanity from AI dangers, OpenAI also promoted "a pilot program of fellowships and focused research grants of up to $100,000 and up to $1 million in API credits for work that builds on these and related policy ideas."

However, The New Yorker's report makes it easier to question whether the recommendations were rolled out to distract from mounting public fears about child safety, job displacement, or energy-guzzling data centers. One recent Harvard/MIT poll found that Americans' biggest concern is that powering AI will hurt their quality of life, Axios reported. Ultimately, these concerns might sway votes for Democrats and Republicans ahead of the midterm elections, the WSJ noted, as data center moratoriums that could slow AI advancement are gaining traction.

For Altman and his company, getting the public to buy into their vision of AI at this critical juncture likely feels essential, since a loss of Republican control of Congress could pave the way for stricter AI safety laws, which The New Yorker noted Altman has privately lobbied against.

Without trust in Altman, it's likely a much harder sell to convince the public that OpenAI isn't simply saying whatever it will take to entrench its own dominance, The New Yorker suggested.

What exactly is OpenAI pitching?

"We don’t have all, or even most of the answers," OpenAI said. Instead, the company characterized its "industrial policy for the intelligence age" as "initial ideas for an industrial policy agenda to keep people first during the transition to superintelligence."

Calling for "common-sense" regulations and a public-private partnership to quickly iterate on successes, OpenAI pitched "ambitious" policy ideas to ensure that everyone can access AI and profit from it. Its bushy-tailed vision acknowledged that it hopes to achieve what society never did: guarantee Internet access and ensure AI is "fairly deployed" across the US, with everyone trained to use it.

Worker protections are a focus of OpenAI's plan. Recommendations included involving workers in discussions on how AI systems work to improve productivity and make workplaces safer, as well as on how to "set clear limits on harmful uses of AI." OpenAI also suggested creating a tax on automated labor that could be used to fund core programs like Social Security, Medicaid, SNAP, and housing assistance as companies rely less on human labor. Among other enticing ideas was a plan to "incentivize employers and unions to run time-bound 32-hour/four-day workweek pilots with no loss in pay that hold output and service levels constant, then convert reclaimed hours into a permanent shorter week, bankable paid time off, or both."

Additionally, OpenAI proposed a "public wealth fund" that "provides every citizen—including those not invested in financial markets—with a stake in AI-driven economic growth."

"Returns from the Fund could be distributed directly to citizens, allowing more people to participate directly in the upside of AI-driven growth, regardless of their starting wealth or access to capital," OpenAI said.

As AI takes on more tasks, humans can gravitate toward care-centric work, OpenAI suggested, recommending policy ideas to help displaced workers get training to work in health care, elderly care, daycare, or community service settings. To ensure people are attracted to those roles—historically undervalued as women's work—OpenAI suggested initiatives to help society recognize that caregiving is "economically valuable work."

Human workers will also be needed to use AI to accelerate scientific advancements, OpenAI said.

However, all these public benefits that OpenAI promises can only be realized if we build a "resilient society" that can quickly respond to risky implementations and "keep AI safe, governable, and aligned with democratic values," the company said.

That aspect of OpenAI's vision requires firms like OpenAI to develop safety systems, among other efforts, that will help improve public trust in AI. And we should trust those systems will work and only interfere with these firms when actual dangers are looming, OpenAI seems to suggest.

"As we progress toward superintelligence, there may come a point where a narrow set of highly capable models—particularly those that could materially advance chemical, biological, radiological, nuclear, or cyber risks—require stronger controls," OpenAI said.

When that day arrives, OpenAI opined, there should be a global network in place to communicate emerging risks. However, only the firms with the most advanced models should be subjected to rigorous audits, so that smaller firms can still compete. That's the path to ensure no firm's dominant position can be abused to unfairly shut down rivals or weaken democratic values, OpenAI said, while insisting that public input is vital to AI's success.

Altman has previously persuaded "a tech-skeptical public that their priorities, even when mutually exclusive, are also his priorities," The New Yorker reported. But for the public, which is already reporting alleged harms from OpenAI models, it might be getting harder to entertain lofty ideas from a company that is led by "the greatest pitchman of his generation," The New Yorker reported.

One OpenAI researcher told The New Yorker that Altman's promises can sometimes seem like a stopgap to overcome criticism until he reaches the next benchmark. When it comes to superintelligence, some optimistic experts think it could take two years, which is longer than Elon Musk stayed at OpenAI before famously criticizing Altman's leadership and leaving to start his own AI firm.

Altman "sets up structures that, on paper, constrain him in the future," the OpenAI researcher told The New Yorker. "But then, when the future comes and it comes time to be constrained, he does away with whatever the structure was."


Testing suggests Google's AI Overviews tell millions of lies per hour

Looking up information on Google today means confronting AI Overviews, the Gemini-powered search robot that appears at the top of the results page. AI Overviews has had a rough time since its 2024 launch, attracting user ire over its scattershot accuracy, but it's getting better and usually provides the right answer. That's a low bar, though. A new analysis from The New York Times attempted to assess the accuracy of AI Overviews, finding it's right 90 percent of the time. The flip side is that 1 in 10 AI answers is wrong, and for Google, that means hundreds of thousands of lies going out every minute of the day.

The Times conducted this analysis with the help of a startup called Oumi, which itself is deeply involved in developing AI models. The company used AI tools to probe AI Overviews with the SimpleQA evaluation, a common test to rank the factuality of generative models like Gemini. Released by OpenAI in 2024, SimpleQA is essentially a list of more than 4,000 questions with verifiable answers that can be fed into an AI.

Oumi began running its test last year when Gemini 2.5 was still the company's best model. At the time, the benchmark showed an 85 percent accuracy rate. When the test was rerun following the Gemini 3 update, AI Overviews answered 91 percent of the questions correctly. If you extrapolate this miss rate out to all Google searches, AI Overviews is generating tens of millions of incorrect answers per day.

The report includes several examples of where AI Overviews went wrong. When asked for the date on which Bob Marley's former home became a museum, AI Overviews cited three pages, two of which didn't discuss the date at all. The final one, Wikipedia, listed two contradictory years, and AI Overviews confidently chose the wrong one. The benchmark also prompts models to produce the date on which Yo-Yo Ma was inducted into the Classical Music Hall of Fame. While AI Overviews cited the organization's website, which listed Ma's induction, it claimed there's no such thing as the Classical Music Hall of Fame.

Google doesn't much like this test. Google spokesperson Ned Adriance tells the Times that Google believes SimpleQA contains incorrect information. Its model evaluations often rely on a similar test called SimpleQA Verified, which uses a smaller set of questions that have been more thoroughly vetted. "This study has serious holes," Adriance told the Times. "It doesn’t reflect what people are actually searching on Google."

Benchmark problems

Evaluating new AI models sometimes feels more like art than science, which is part of the problem. Every company has its own preferred way of demonstrating what a model can do, and the non-deterministic nature of gen AI can make it hard to verify anything. These robots can get a factual question right and then completely miss it if you rerun the query immediately. Oumi even uses AI tools to run its assessments, and those models can hallucinate, too.

The other wrinkle is that AI Overviews isn't a single monolithic model. Google told Ars Technica that it uses the "right model" for each query. While AI Overviews would get the best answers from always running Gemini 3.1 Pro, that's slow and expensive. To load things promptly on a search page, the overview uses faster Gemini Flash models when possible (which appears to be most of the time).

Google's response to this report is telling. In the realm of AI factuality, 9 out of 10 isn't even that bad. Google has recently published benchmarks for new model releases featuring measurements of factuality in the range of 60 to 80 percent—these tests are run without tools like web search. Grounding an AI with more data, like the wealth of human knowledge on the Internet, does make it more accurate than the naked model itself. However, the truth is in the blue links somewhere, and AI Overviews encourages people to accept its sometimes inaccurate summaries instead of checking those sources manually.

While Google says the Times' results don't match what people see, you have to wonder how the company could even know that. You've probably seen mistakes in AI Overviews—we all have because that's just how generative AI works. As Google itself reminds you at the bottom of every overview: "AI can make mistakes, so double-check responses."
